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ABSTRACT 

We investigate how well the 3D density field of neutral hydrogen in the Intergalactic 
Medium (IGM) can be reconstructed using the Lyman-a absorptions observed along lines of 
sight to quasars separated by arcmin distances in projection on the sky. We use cosmological 
hydrodynamical simulations to compare the topologies of different fields: dark matter, gas 
and neutral hydrogen optical depth and to investigate how well the topology of the IGM can 
be recovered from the Wiener interpolation method implemented by Pichon et al. (2001). The 
global statistical and topological properties of the recovered field are analyzed quantitatively 
through the power-spectrum, the probability distribution function (PDF), the Euler charac- 
teristics, its associated critical point counts and the filling factor of underdense regions. The 
local geometrical properties of the field are analysed using the local skeleton by defining the 
concept of inter-skeleton distance. 

As a consequence of the nearly lognormal nature of the density distribution at the scales 
under consideration, the tomography is best carried out on the logarithm of the density rather 
than the density itself. At scales larger than ~ 1.4(g?los)> where (c?los) is the mean separation 
between lines of sight, the reconstruction accurately recovers the topological features of the 
large scale density distribution of the gas, in particular the filamentary structures: the inter- 
skeleton distance between the reconstruction and the exact solution is smaller than (cZlos)- 
At scales larger than the intrinsic smoothing length of the inversion procedure, the power 
spectrum of the recovered Hi density field matches well that of the original one and the low 
order moments of the PDF are well recovered as well as the shape of the Euler characteristic. 
The integral errors on the PDF and the critical point counts are indeed small, less than 20% for 
a mean line of sight separation smaller than ^2.5 arcmin. The small deviations between the 
reconstruction and the exact solution mainly reflect departures from the log-normal behaviour 
that are ascribed to highly non-linear objects in overdense regions. 

Key words: methods: statistical, hydrodynamical simulations - cosmology: large-scale struc- 
tures of universe, intergalactic medium - quasars: absorption lines 



1 INTRODUCTION 



The structure and composition of the intergalactic medium (IGM) 
has long been studied using the Ly-a forest in QSO absorp- 
tion spectra (Rauch 1998). The progress made in high resolution 
Echelle-spectrographs has led to a consistent picture in which the 
absorption features are related to the distribution of neutral hy- 
drogen through the Lyman transition lines of Hi. Hydrogen in the 
IGM is highly ionized (Gunn & Peterson, 1965). Its photoioniza- 
tion equilibrium in the expanding IGM establishes a tight corre- 
lation between neutral and total hydrogen density and numerical 
simulations have confirmed the existence of this correlation. They 
have also shown that the gas density traces the fluctuations of the 
DM density on scales larger than the Jeans length (see for example 



Cen et al. 1994, Petitjean et al. 1995, Miralda-Escude et al. 1996, 
Theuns et al. 1998, Viel, Haehnelt & Springel 2004). 

As we will show in the first part of this work, the statistical 
and topological properties of the IGM and of the dark matter dis- 
tributions are the same, so that recovering the three-dimensional 
distribution and inferring the topological properties of the IGM al- 
lows us to constrain the properties of the dark matter distribution as 
well. 

Although topological tools have been introduced only rela- 
tively recently in cosmological analysis, they have been used ex- 
tensively to characterize the topology of large scales structures as 
revealed by the three-dimensional distribution of galaxies in the 
local universe (see for exemple Gott et al. (1986), Vogeley et al. 
(1994), Protogeros & Weinberg (1997), Trac et al. (2002), Park et 
al.(2005) and Sousbie et al. (2006) for the topological analysis of 
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galaxy surveys). The outcome of such an analysis is a quantita- 
tive description of the complex appearance of the distribution of 
the matter in the universe, with its network of clump, voids, fila- 
ments and sheet-like structures. The study of the topology using 
galaxy surveys is attractive because of their large volume and the 
huge number of objects they contain. However the clustering of 
highly non linear objects (galaxies, clusters of galaxies or QSOs) is 
biased compared to the underlying clustering of dark matter fluc- 
tuations that we wish to constrain (see Kaiser 1984). This biasing 
results from a complicated and delicate competition between a va- 
riety of processes which are often too complicated to be tractable 
analytically. Besides, the maximum redshift in surveys is low (in 
the analysis of the SDSS data made by Park et al. 2005, the max- 
imum redshift is z = 0.1654), so that this kind of analysis can be 
done only in the local Universe, where the fluctuations have already 
entered the highly non-linear regime. 

Given the strong correlation existing between dark matter 
distribution and the low-density intergalactic medium, one could 
probe the underlying distribution of matter via the signature pro- 
duced by diffuse hydrogen in quasar spectra, namely absorption 
features observed in the Ly-a forest. Indeed, absorption spectra 
provide a picture complementary to those drawn by galaxy surveys 
to infer the large scale distribution of the matter in the Universe, 
since the absorption features produced by the IGM in Ly-a forest 
can be detected also at large redshift and since the IGM probes the 
low density range, whereas the galaxy distribution does not. Even- 
tually, higher density contrasts can be recovered from the analy- 
sis of the Ly-a forest if higher order transitions are included in the 
analysis; for example, the Ly-/3 transitions should allow us to probe 
density contrasts up to 8 w 15. 

The flux along a single line-of-sight towards a quasar only 
provides one dimensional information, which can be used to con- 
strain the fluctuation amplitude and the matter density (Nusser & 
Haehnelt 1999, Rollinde et al. 2001, Zaroubi et al. 2006). The trans- 
verse information, found in pairs of quasars, has been used to study 
the extension of the absorbing regions (e.g. Petitjean et al. 1998; 
Crotts & Fang 1998; Young, Impey & Foltz 2001; Aracil et al. 
2002) and the geometry of the Universe at z ~ 2 (Hui, Stebbins 
& Buries 1999; McDonald & Miralda-Escude 1999; Rollinde et al. 
2003, Coppolani et al. 2006). 

Given a set of lines of sight (LOSs) toward a group of QSOs 
with small angular separation, inversion methods can be used to 
recover the three-dimensional distribution of low density gas, as 
demonstrated in Pichon et al. (2001). They showed that the visual 
characterization of the density field (with its network of filaments, 
clumps, voids and pancakes) is correctly reproduced if the mean 
separation between the LOSs is less than (c^os) < 5 Mpc. 

In this paper we test quantitatively whether such an inversion 
can recover the global properties of connectivity of the density 
field, using topological tools such as the Euler characteristic and 
the probability distribution function. 

The paper is organized as follows. In Section 2, the Euler char- 
acteristic is defined as an alternate critical points count and imple- 
mented for a Gaussian field. The difference between the topolog- 
ical properties of the dark matter, of the gas and of the observed 
optical depth is then discussed using outputs of a hydrodynami- 
cal simulation (Section 3) and relying on different statistical tools. 
In Section 4, the ability to reconstruct the global topology of the 
three dimensional distribution from a simple Wiener interpolation 
of a discrete group of lines of sight is considered. Finally, Section 5 
summarizes the results of this paper and discusses some possible 



improvements of the method as well as observational constrains 
from future surveys. 



2 THE EULER CHARACTERISTIC: AN ALTERNATE 
CRITICAL POINT COUNT 

This paper makes use of various statistical tools, namely the PDF, 
the Euler characteristic, the skeleton and related estimators such 
as the first cumulants of the PDF (connected moments), critical 
point counts and the filling factor, to characterize the topology of 
the large scale density distribution. These tools will also be used 
to test the efficiency of reconstructing the density field from a grid 
of QSO sight-lines and in particular the ability to reproduce the 
connectivity of the large scale structures. 

Following Colombi, Pogosyan & Souradeep (2000, hereafter 
CPS), this section introduces the Euler characteristic, \ + , as an al- 
ternate critical point count in an overdense excursion with density 
contrasts larger than a threshold 5th- It is shown how the behavior 
of x + is related to connectivity in the field. The numerical imple- 
mentation used to measure it is described and tested on Gaussian 
random realizations. 

2.1 Definition of the Euler characteristic 

Let 5(x) be a scalar function defined in a 3D volume V. Given a 
threshold value 5th, consider the excursion set E + formed by the 
points x with 5(x) > 5th, as expressed by the following equation: 

= {x|<S(x) ><5th}. (1) 

The analysis of the geometrical properties of points that belong to 
the excursion set E + as a function of 5th gives information about 
the global topology of the scalar field 5(x) and allows for the char- 
acterization of large scale structures. 

A simple qualitative link can be established between the dis- 
tribution of critical points (defined by V5 = 0), a on the one hand, 
and connectivity on the other, which are related to local and global 
properties of the excursion set respectively. If one considers over- 
dense regions, connectivity happens along ridges (filaments) pass- 
ing through saddle points and connecting local maxima. The same 
reasoning can be applied to under-dense regions where minima are 
connected through tunnels (pancakes) via another kind of saddle 
point. This idea is in fact supported on rigorous grounds by Morse 
theory (see Milnor 1963). The Morse theorem establishes the link 
between the distribution of critical points and the global connectiv- 
ity of the excursion set, via the Euler characteristic. This quantity 
represents the integral of the Gaussian curvature over an iso-density 
surface that marks the boundary of the excursion set (see for exem- 
ple Gott, Melott & Dickinson, 1986). It is usually defined as the 
following count (see for example Mecke, Buchert & Wagner, 1994, 
for details): 

X + = connected components — tunnels + cavities. (2) 

According to Morse theorem, it can also be expressed as a linear 
combination of the number of critical points of different types that 
are found in the excursion set as a function of 5th- 

To be more specific, let us consider the critical points of the 
field. For these points, the Hessian matrix, whose components are 
given by: 

Hi = 9H , (3) 
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Figure 1. Mean Euler chai'acteristics, x ( see Eqs. 4 and 6), for a Gaus- 
sian random field (GRF, points with small errorbars) as a function of the 
density threshold, <5/o\ compared to the theoretical prediction (Doroshke- 
vich, 1970, smooth curve). The mean is carried over 5 realizations of a GRF 
whose power spectrum is given by a power-law with spectral index n = — 1 
on a 256 3 grid, while additional smoothing is performed with a Gaussian 
window of size 5 pixels. 



Figure 2. Evolution of the number of critical points entering the computa- 
tion of the Euler characteristic for the GRF considered in Fig. 1 . The frac- 
tion of different types of critical points above the threshold is plotted as a 
function of 8 /a and each distribution is compared to the analytical predic- 
tion. Again, symbols with error bars represent the mean over 5 realizations 
of the same GRF, while the smooth curves give the analytical prediction 
(which can be easily derived from Bardeen et al. 1986). 



is calculated and its eigenvalues are estimated. According to the 
number of negative eigenvalues, /, of the Hessian matrix, the lo- 
cal structures of the field can be classified in the following way: a 
clump, a filament, a pancake and a void corresponding to I =3, 
2, 1 and respectively. The Morse theorem states that the Euler 
characteristic can be expressed as a count of the number of critical 
points belonging to each of these four classes: 

X + = Ni =3 - Ni =2 + Ni = i - Ni =0 , (4) 

where Ni=i is the number of critical points with i negative eigen- 
values. With this approach, it is sufficient to determine the number 
distribution of the four kinds of critical point. However this dif- 
ferential method requires the field under consideration to be suffi- 
ciently smooth and non degenerate. To this end, in the subsequent 
analyses of this paper, the field will be smoothed with a Gaussian 
window (using standard FFT technique), 

^ (r)= (2^jW eXP (~i)' (5) 

of sufficiently large size L a compared to the sampling grid pixel 
size in order to minimize the impact of numerical artefacts com- 
ing from the discretization of the field on a grid (see, e.g., CPS 
for a thorough analysis of measurement issues). In what follows, 
the smoothing scale used to measure the Euler characteristic is al- 
ways larger than 0.01 x Af p i x grid pixels where Apix = 256 pixels 
is the box resolution of the simulations. In principle this smooth- 
ing scale is large enough to have an unbiased measurement of the 
Euler characteristic. The prescription of CPS is used to detect and 
classify critical points. This method involves locally fitting a sec- 
ond order hypersurface on the smoothed density field, while taking 
into account each point on the grid under consideration and its 26 
neighbors. 



2.2 Interpretation of the Euler characteristics 

For clarity, let us recall here the interpretation of the shape of the 
Euler characteristic as a function of density threshold (CPS). Let us 
first study the simple case of a Gaussian random field (GRF). The 
analytic predictions for a GRF are given, for example, in Doroshke- 
vich (1970) (see also, Schmalzing & Buchert 1997). In what fol- 
lows, a slightly different normalization from Eq. (4) is used: the 
volume independent quantity 

X + =X + /Ntot, (6) 

where iVtot = • Ni=i is the total critical point count in the vol- 
ume considered, in the limit <5th — > — oo. 

In Figure 1 , the numerical estimates of the Euler characteristic 
are given as a function of the density threshold 5th = 8/cr where 
5 = (p — p)/p is the density contrast and a = rms(<5) from five 
Gaussian random field (GRF) realizations (points with error bars) 
whose power spectrum is given by a power-law with spectral index 
n = — 1, i.e. P(k) oc k n . The result is compared with the ana- 
lytic prediction (solid line). The shape of the curve as a function of 
the density threshold can be understood from Equation (4) and Fig- 
ure 2 which displays the critical point counts. At very low values 
of the threshold (S/a < — 4) the excursion set includes almost all 
points and, due to the symmetry between high and low-density re- 
gions, the number of minima (pancakes) compensates the number 
of maxima (filaments) so that the Euler characteristic approaches 
zero. When the value of the threshold is increased, local minima 
first drop out of E + , creating cavities and thus increasing the value 
of x + ■ At 8 /a > — 2, pancakes start to drop out too and cavities 
connect together, thus the value of x + decreases, reaching its mini- 
mum at S/a « 0. In the range < 5 /a S; 2 filaments also drop out, 
breaking up the ridges to create isolated clusters, thus increasing 
X + again. Finally in the region 5/cr > 2 only clumps are found to 
lie in the excursion set, but they are progressively lost as the thresh- 
old increases, explaining the final decrease of the curve. 
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This simple analysis shows how the features seen in the Euler 
characteristic are closely related to the network of filaments and 
pancakes that connect clumps and voids. 



3 FROM DARK MATTER TO OPTICAL DEPTH 

The Lyman-a absorption lines observed in QSOs spectra and pro- 
duced by the Hi structures intercepted by the line-of-sight can 
be used to study the topology of the Universe at high redshift 
(z > 1 — 2). However, the information derived from observations 
of QSOs spectra is more directly related to the Hi optical depth, 
whereas here the aim is to constrain the underlying dark matter 
density field for which theory makes direct predictions. Hence, one 
has to rely on simulations in order to calibrate the relation between 
the density field of the neutral hydrogen and that of the dark matter. 

In this section we first present the hydrodynamical simulations 
used in the present work and we then analyse the shape of the PDF 
and the Euler characteristic of the three density fields (dark matter, 
gas and Hi) and of the optical depth field, explaining how these 
curves are related. 



Bouchet & Schaeffer 1994) and the Euler number (see, e.g., CPS). 
For the reconstruction, the typical separation (g(los} between lines 
of sight defines a natural smoothing scale L B ~ (^los)- Note that, 
unfortunately, the upper bound of L s ~ 4 Mpc corresponds to 
a lower bound on (^los) in current state of the art observations 
(Rollinde et al. 2003), but one can expect to lower this limit in fu- 
ture surveys (Theuns & Srianand 2006). Hence the following anal- 
yses are performed in the range 2 Mpc < L s < 4 Mpc. 

3.2 PDF and Euler characteristic of physical density fields 

It this Section we compare the large-scale distribution and the topo- 
logical properties of the different density fields (dark matter, total 
amount of gas and neutral gas) by looking at their probability dis- 
tribution function (PDF) and their Euler characteristic (x + )- Our 
knowledge of the physics of the intergalactic medium is used to 
perform the analysis and to link the distributions of Hi and H. In- 
deed, the observations give access to the Hi optical depth through 
absorption spectra. We also consider thermal broadening and red- 
shift distortion effects. 



3.1 Numerical simulations 

We analyse a cosmological hydrodynamical simulation that evolves 
both dark matter particles and a gaseous component to study the 
global topology of the intergalactic medium at redshift z — 2. The 
dynamical evolution and the physical properties of the gas and the 
of the Hi component are calculated taking into account the heat- 
ing and cooling processes and the effect of the ionizing UV back- 
ground in a standard way. The corresponding Particle-Mesh (PM) 
code used to perform the simulation is described in detail in Cop- 
polani et al. (2006). 

In this run, the standard ACDM model is assumed with a set 
of cosmological parameters given by: f2 m = 0.3 and J1a = 0.7, 
while the assumed baryon density is = 0.04. The Hubble con- 
stant is Ho = 70kms~ Mpc -1 and the amplitude of the fluctu- 
ations of the matter density field in a sphere of radius 8 ftT 1 Mpc 
is as = 1. While the other cosmological parameters are roughly in 
agreement with recent observational constraints, the value of as is 
somewhat large compared to the value suggested by WMAP (see 
Spergel et al., 2007). However, this should not have any incidence 
on the results derived in this paper. 

The simulation involved 512 3 dark matter particles in a box 
with periodic boundary conditions of comoving size L box = 40 
Mpc. The gaseous component was also followed on a 512 3 grid 
which was used to compute gravitational forces. Although this sim- 
ulation marginally resolves the Jeans length of the gas, Coppolani 
et al. (2006) checked with higher resolution runs that numerical 
convergence was achieved at small scales. 

Although 512 3 grid points were available, this resolution was 
degraded to a 256 3 resolution (using standard donner cell pro- 
cedure), in order to make the calculations more tractable. Obvi- 
ously this additional smoothing makes the effects of subclustering 
within the Jeans length irrelevant. Therefore the gaseous compo- 
nent should be nearly indistinguishable from the dark matter com- 
ponent. 

The main limit in these analyses remains the box size, which 
is still small and only allows for a fair statistical measure at scales 
L s larger than I/ max ~ L box /10, i.e. 4 Mpc. Indeed finite volume 
effects are known to become significant for L s > Z/ max for stan- 
dard statistics such as the probability function (see, e.g., Colombi, 



3.2. 1 From dark matter to Hi: IGM equation of state 

It is well known that on scales larger than the Jeans length the dis- 
tribution of the gas follows the distribution of dark matter, so that 
their statistical and topological properties are expected to be the 
same at these scales. This is checked by comparing the PDF and 
the Euler characteristic of the two density fields smoothed using 
different values of L s , as shown in Figs. 3 and 4. Note the agree- 
ment of the PDFs and of the Euler characteristics of the two fields 
for all values of the smoothing scale considered, a result which can 
be expected since the scaling regime probed is largely above the 
Jeans length of the gas. 

The comparison of the distribution of the neutral gas (Hi) with 
that of the total amount of gas and the dark matter calls for a slightly 
more elaborate approach, given the non-linearity involved in the 
expression that relates the distribution of the gas to the distribution 
of Hi. In fact, numerical simulations support the idea that a tight 
correlation exists between neutral and total hydrogen density (Cen 
et al. 1994, Miralda-Escude et al. 1996, Theuns et al. 1998, Viel, 
Haehnelt & Springel 2004). This correlation is expected to follow 
a power-law of the form 

Pgas « A ■ ( Pm ) a . (7) 

We thus introduce here a new density field pa defined as the right- 
hand-side of Equation (7), so that pm = A- (pm) a . In what follows 
this new density field will be used in order to approximate the den- 
sity of the gas. However, Equation (7) is not fulfilled in the whole 
range of pn\ values. 

To illustrate this, Fig. 5(a) displays the gas density distribution 
(top), with its network of filaments outlined in the left panel with 
a contour corresponding to 8 = 1, and the temperature distribution 
in units of 10 4 K (bottom) for which we have drawn the contour 
corresponding to (T/10 4 ) = 2 in the left panel. Note that along 
filaments and at their intersection the gas is hot. This indicates that 
shock waves propagate along filaments, rising the temperature and 
ionizing the gas. This is confirmed by Fig. 5(b) which shows the 
ratio R = pm/pgas measured directly in the 256 3 grid (top), and 
after a Gaussian smoothing with a window whose size is L s = 2.2 
Mpc (bottom). In both cases, the panels on the left show the con- 
tours relative to R = 0.7. To complete the picture, let us consider 
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Figure 3. Probability distribution function of density fields at different smoothing scales (from left to right, top to bottom, no smoothing, L s =1.6, 2.2 and 3.6 
Mpc). The thick solid, thick dashed and thin solid curves correspond to dark matter, gas and Hi (rescaled according to Eq. 7), respectively. The dotted curve is 
a best fit of a lognormal distribution to the thin solid curve, showing that all these PDFs are reasonably close to lognormal, a property that will be useful for 
the reconstruction. In the unsmoothed case, the gas and H I PDFs match very well for 1 + <5 ~ 1 but depart from each other at higher density. The apparent 
very good match in the unsmoothed case comes from the fact that the un-shocked part of the intergalactic medium totally dominates the part of the PDF which 
is visible in this panel. The match between Hi and gas PDFs decreases with increasing smoothing scale, due to "mixing effects", as explained in the main text. 
Note finally that the dark matter is not displayed in the unsmoothed panel because the result would be contaminated by the cloud-in-cell interpolation used to 
compute the density on the grid. 
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Figure 4. Same as in Fig. 3 but for the measured Euler characteristic as a function of density threshold at different smoothing scales. Again, the thick solid, 
thick dashed and thin solid curves correspond respectively to dark matter, gas and Hi (rescaled according to Eq. 7 with the values (A, a) given in Table 1). 
While the curves for the dark matter and for the gas superpose exactly at all smoothing scales and all values of the threshold, the Hi, even after the scaling is 
applied, behaves in a different way in the high density region. As explained in the text, this is a consequence of the presence of shocks and condensed objects, 
whose effect is a change in connectivity properties. 
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Figure 5. Top: Gas density and temperature (in units of 10 4 K) spatial distributions in a one pixel (as 15.6 kpc) slice. The intensity of the fields is color-coded 
with the scale given on the right. The panels on the left give the contours corresponding to <5 = 1 for the density and to T/10 4 = 2 for the temperature. 
Bottom: For the same slice as above we show on the right the spatial distribution of the ratio R = pm/Pgas f° r me unsmoofhed field (up) and for the field 
smoothed with a Gaussian window of size (FWHM) L B = 2.2 Mpc (down). The color scale is such that darker regions correspond to low values of R. On the 
left the contours correspond to R = 0.7. 



Fig. 6 which shows the scatter between p gas and pm for different 
smoothing scales, as indicated in each panel. As expected, the tight- 
ness of the correlation is very high in underdense and moderately 
dense regions, but shock heating on the one hand and the forma- 
tion of condensed objects on the other produce a significant scatter 
(where R < 1) along densest filaments and at their intersection (in 
clusters). For the purpose of the reconstruction, some smoothing is 
required. Unfortunately, smoothing also mixes these regions with 
the un-shocked part of the intergalactic medium. This is confirmed, 



in a qualitative way, by the slices shown in Fig. 5(b). More quantita- 
tively, for the fields R shown in Figure 5(b) we have calculated the 
fraction of the volume occupied by the regions with R < 0.7 (i.e. 
the volume of the regions enclosed by the contours in the left pan- 
els). For the slice shown, this fraction is f(R < 0.7) = 0.07 and 
f(R < 0.7) = 0.19 for the un-smoothed and the smoothed case re- 
spectively, while when the whole three dimensional boxes are con- 
sidered, the fraction of volume occupied by shocked regions are 
f(R < 0.7) = 0.05 and f(R < 0.7) = 0.11. As a result of such 
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Figure 6. Scatter plots displaying the relation between the gas density and the Hi density at different smoothing scales. The dashed black lines in each panel 
represent the best fit, following Eq. (7), with the parameters (A, a) given in Table 1. Note that the dispersion increases when the smoothing scales increases, 
due to the mixing effect discussed in Figure 5. 





A 


a 


Unsmoothed 


1.275 


0.63 


1.6 Mpc 


0.915 


0.56879 


2.2 Mpc 


0.85 


0.55209 


3.6 Mpc 


0.795 


0.5389 



Table 1. Values of the parameters A and a entering in the scaling relation 
between gas and Hi (see Eq. 7) as a function of the smoothing scale L B . 



mixing, the tightness of the correlation is weakened, but remains 
good as shown in Fig. 6. However, the best fit values of the pa- 
rameters A and a changes slightly when the field is smoothed (see 
Table 1). We fit these values with the low density tail of the PDF 
(see Figure 3). As expected, the higher density tail match worsens 
with smoothing. 

Given that the scaling relation (7) is monotonous, should it 
apply exactly, the topology of the neutral gas should be exactly 
the same as of the total gas/matter distribution. However, given 
the dispersion of this relation, one expects the Euler characteristic 
of the /5hi field to depart from that of p gas for large density con- 
trasts. This is confirmed by Fig. 4: a nearly perfect agreement is 
found between the gas and Hi for <5 < 0, while differences become 
significant at larger values of the density contrast. Increasing the 



smoothing length (i.e. going from left to right in Fig. 4) worsens 
the match, as expected, but this is in part lost in the noise due to 
finite volume effects. Note that \ + measured in Hi is, in the 8 > 
regime, more peaked than for the total gas. This agrees with intu- 
ition, since galaxies form in filaments: in these highly condensed 
objects, gas concentrates and cools down. Hence the Hi density be- 
comes significant again inside these clumps, but is depleted in their 
surroundings due to shock heating as can be seen from Fig. 5. The 
resulting distribution of Hi in filaments is therefore expected to be 
more clumpy than the total gas i.e. less efficiently connected, re- 
sulting in a larger increase of \ + f° r 5 > 0. The estimates made 
inside filaments are however certainly not free of numerical arte- 
facts since they are limited by the simulation's spatial resolution 
(following accurately the formation of condensed objects requires 
much higher spatial resolution than our simulation). Therefore al- 
though one can definitely trust the 5 < measurements, the results 
derived for 8 > are likely to yield the right qualitative behavior, 
but are certainly quantitatively biased. 



In the next Sections, the 8 > disagreement will be ignored 
and it will be assumed that the scaling relation (7) is always valid, 
keeping in mind the limitation of such an assumption. Hence, re- 
constructions will be performed on the optical depth without at- 
tempting to directly recover the gas distribution. 
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Figure 7. The effect of redshift distortion on the HI density. The same slice (whose width is 6 pixels, corresponding to 0.94 Mpc) of the Hi density contrast is 
shown without distortion (left panels) and with distortion (right panels, using the infinitely distant observer approximation with a distortion along the x axis) 
in the case where the fields are not smoothed (top panels) and when the fields are smoothed at L B = 1.6 Mpc (bottom panels). 




Figure 8. Effect of redshift distortion on the Euler characteristic of Hi at different smoothing scales: L s = 1.6 Mpc, 2.2 Mpc and 3.6 Mpc. The solid black 
line is for Hi without any distortions, the dashed yellow line has been obtained by including only the effect of peculiar velocities, while for the red thin line 
both redshift distortion and thermal broadening are taken into account. 



3.2.2 From Hi to optical depth: redshift space distortions and 
thermal broadening 

In the above discussion we argued that the main features of dark 
matter topology, as traced through the Euler characteristic and the 
probability density function, can be recovered through the topol- 
ogy of the Hi for small density contrasts, 5 < 0. However, along a 
line-of-sight, the optical depth is in fact observed in redshift space, 



where distortions induced by the peculiar motions operate. More- 
over, the profiles of absorption lines are broadened at small scales 
by the effect of the temperature. Since the thermal broadening is 
important only at scales of the same order or smaller than the Jeans 
length, this second effect should be negligeable in the scaling range 
considered in this paper, since it will be swept out by the smooth- 
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ing. On the contrary, redshift distortion should a priori not be ne- 
glected. 

In theory, it is possible to partially correct for redshift distor- 
tion effects (see for instance PVRCP). However, the corresponding 
treatment of the peculiar velocities involves a simultaneous decon- 
volution of the Hi density field with the velocity field on top of the 
inversion discussed in next section. This requires not only a prior 
for the density field but also for its correlation with the peculiar ve- 
locity field and makes the inversion quite convolved and this would 
go beyond the scope of this paper. In what follows, it is shown that 
in fact redshift distortions have a small effect on the topology of the 
overall density distribution for the probed scales; they shall thus be 
neglected in the reconstruction part of this work. Moreover, one 
of the interesting outcomes of the reconstruction is to predict the 
positions of filaments in the three dimensional matter distribution. 
Cross-correlation of such a distribution with for instance the ob- 
served distribution of galaxies at high redshift can in fact be also 
performed in redshift space. 

Figure 7 displays the Hi distribution in real and redshift space 
with and without smoothing: the main effect of redshift distortion 
on Hi is an enhancement of large scale density contrasts orthog- 
onally to the line of sight due to large scale motions (this is the 
so-called Kaiser effect, e.g. Kaiser 1987): the "voids" (underdense 
regions) are more pronounced, and the filaments orthogonal to the 
LOS are more contrasted. There is as well a small scale "finger of 
god" effect, due to internal motions inside large dark matter haloes, 
but it is not very pronounced at such a high redshift, and is in ampli- 
tude of the same order of thermal broadening. Note however, that 
non trivial shell crossings can still occur, e.g. two filaments cross- 
ing each other thanks to peculiar velocities, but this effect remains 
small, and is clearly damped out by smoothing; after smoothing 
only the Kaiser effect remains. 

These qualitative arguments are illustrated by Figure 8. The 
measured Euler characteristics before and after redshift distortion 
differ only slightly. When the redshift distortion is taken into ac- 
count, for 8 < 0, a shift towards the left is induced (dashed curve) 
as compared to the non-distorted case (solid one); while the oppo- 
site occurs for 5 > (although in the latter case, the effect seems 
to be nearly insignificant). This shift remains quite small as argued 
before. Note as well that thermal broadening (thin curve) is totally 
negligeable. 

Finally, one last point should be mentioned. When one con- 
siders real absorption spectra, instrumental noise has to be taken 
into account in the analysis. This noise, combined with saturation 
of the flux of the Ly-a absorption lines arising in high density re- 
gions (with 8 > 10) can complicate the interpretation of the mea- 
surements. In this case, some of the information about the intensity 
of the density field cannot be recovered, unless, say, Lyman-/? is 
also available. In this work, however, the main interest lays in re- 
producing the low-density part of the Hi distribution, for which the 
relation (7) holds and for which the topology of the underlying dark 
matter distribution is theoretically constrained. In this regime, the 
Ly-a lines are not saturated, thus a complete treatment of satura- 
tion effects in high density regions is not required for the aim of the 
present work. 



4 TOPOLOGICAL AND STATISTICAL PROPERTIES OF 
THE RECOVERED FIELDS 

The absorption spectrum towards a quasar gives access to one- 
dimensional information i.e. the optical depth along the line of sight 



(LOS) towards the QSO. However, if a set of LOSs towards a group 
of quasars is available, the information along each LOS can be in- 
terpolated to construct a 3-dimensional optical depth field. 

In this Section we first briefly outline the inversion technique 
implemented to recover the optical depth and describes how to set 
the parameters that enter the inversion procedure. We then check 
how the reconstruction performs by measuring various statistical 
quantities, in particular the PDF of the density field and its Euler 
characteristic. As argued in the previous Section, the focus is on 
the optical depth: no attempt is made to recover the gas or dark 
matter distribution directly. Thermal broadening, redshift distortion 
and effects of saturation or instrumental noise are neglected. Given 
these assumptions, studying the optical depth distribution is then 
equivalent to studying the Hi density distribution, pui- 



4.1 The inversion method: Wiener interpolation 

The technique used to interpolate the optical depth field between 
lines of sight is described and discussed in details in PVRCP. 

Let D be a 1 -dimensional array representing the data set (i.e. 
the values of 7los = ln(pHi) along the LOSs, which we assume 
to be parallel to each other); we call M the 3-dimensional array of 
the parameters that need to be estimated (here the values of 730 = 
In(pHi) in the 3-dimensional volume) by fitting the data. Wiener 
interpolation reads (see Eq. 20 of PVRCP), assuming the noise is 
uniform and uncorrelated, 

M = Cmd-(C dd + N)- 1 -D, (8) 

where N = n 2 I is the diagonal noise contribution, Cmd is the 
mixed parameters-data covariance matrix and Cdd is the data co- 
variance matrix: 

Cmd — C 73D7LOS , Cdd = C 7LOS7LOS . (9) 

Here an ad-hoc prior is used and a Gaussian shape for the co- 
variances is assumed. In this cases the matrices C 73D7LOS and 
C 7los7los ^ g iven b Y 

C(xi,a;2,xix,X2x) = a x cxp ^ Jx 

/ |xi ± -x 2 ±| 2 \ 

exp l z? — )' (10) 

where (xi, X;x) represents the coordinates of the points along and 
perpendicular to the LOSs respectively, L x and Lt are correlation 
lengths along and perpendicular to the LOSs, while a 2 quantifies 
the typical a priori fluctuations in a volume of size L x x L T . The 
meaning and choice of these parameters will be discussed further 
in § 4.2. 

Note that the shape of the covariance matrix can be calculated 
with a more sophisticated approach. This would involve the use of 
theoretical priors relying on our knowledge of large scale struc- 
ture dynamics. If, for instance, the reconstruction was performed 
on the pure dark matter density contrast, one could be tempted to 
derive these correlation matrices from the non-linear power spec- 
trum obtained, for example, by Peacock & Dodds (1996), given a 
cosmological model. Here, a simpler interpolation scheme is used. 
This scheme has the advantage of depending only on three tuning 
parameters: the assumed typical overall signal-to-noise ratio, u/n, 
and two typical lengths in the interpolation, L x and Lt- 



10 Sara Caucci et al. 



4.2 Choice of the parameters in the interpolation 

Each reconstruction is performed on a number /Vlos of LOSs ex- 
tracted at random from the simulation box. Since the distant ob- 
server approximation is implemented, all the LOSs are parallel. 
For a given value of /Vlos, the mean inter-LOS distance, (dLOs). 
reads: 

<d LOS ) = ^LlJN LO s. (11) 

This parameter obviously defines a natural scale in the reconstruc- 
tion: one cannot, intuitively, expect to reconstruct details of the dis- 
tribution at scales < (cZlos), at least in the direction orthogonal to 
the LOSs. 

The meaning of the parameters Lt and L x in Eq. (10) is then 
quite straightforward. The correlation lengths L — Lt and L — L x 
stabilize the inversion by ensuring the smoothness of the recon- 
struction. In order to avoid the formation of fictitious structures, the 
transverse correlation length must be of the order of the mean sepa- 
ration between the LOSs, Lt ~ (c^los) (we have chosen to take it 
exactly equal to (^los)), while the choice of the longitudinal cor- 
relation length depends on the problem considered. Since redshift 
distortion is not a concern in the present work, this parameter can 
be chosen to be of the order of the Jeans length in order to avoid 
information loss for small scales along the LOSs, here L x — 0.4 
Mpc. 

From a practical point of view, the variance parameter a of the 
correlation matrix fixes the relative contribution of signal to noise 
in Eq. (8), a/n. In our reconstruction, only ideal LOSs are con- 
sidered. Thus, strictly speaking, there is no instrumental noise or 
saturation effects. However, the inversion of the matrix Cdd + N 
is numerically unstable when N is set to zero, given the finite sam- 
pling and the degeneracy of the matrix, Eq (10), close to its diag- 
onal, (ii,xn) ~ (:r2,x 2 ±). In practice one has to "tune" the 
signal-to-noise ratio, a /n, to obtain the best compromise between 
numerical stability and "exactness" of the final reconstruction. This 
choice is ad-hoc: (a/n) 2 is the estimated variance o 2 (Lt, L x ) of 
the underlying field in a box of size Lt x Lt x L x . This is equiv- 
alent to assuming that, as the noise goes to zero, the inverse of the 
non-reduced second order correlation (in the appropriate units) is 
used, I + Cdd, instead of the reduced one, C^ D , to perform a 
stable reconstruction. 

In this work we estimate directly a (Lt, L x ) from the simu- 
lation. It is however important to note that o 2 (Lt,L x ) can in prin- 
ciple be derived from the LOSs alone by measuring the 1-D power- 
spectrum of phi. From this 1-D power-spectrum, one can indeed 
infer a 3-D power-spectrum with standard deconvolution methods 
and then an estimate of o 2 (Lt, L x ) by the appropriate integral on 
the 3-D power-spectrum. 

The measured values of o(Lt, L x ) are listed in Table 2. They 
are of the order unity: the assumed signal-to-noise ratio is about one 
in the regime considered here. Hence, in practice, the ad-hoc pro- 
cedure used to perform the inversion does not change significantly 
by including the contribution of the actual instrumental noise. How- 
ever, in this case the presence of the saturated regions in the Lyman- 
q spectra remains a problem. 

Finally, note that due to the large size of the matrices, re- 
constructions can only be performed by partioning the simulation 
box in blocks of smaller size, that contain iV s 3 ub grid points with 
iV su b = 32. The reconstruction is performed on each block indi- 
vidually. In order to avoid edge effects, neighboring patches are 
overlapped by adding a buffer region in which LOSs still con- 
tribute. In this way, the a priori correlation ensure continuity be- 



Atos 


Separation 


L/j 1 


a 




(arcmin) 


(Mpc) 




400 


1.33 


2 


1.12 


320 


1.49 


2.24 


1.17 


225 


1.77 


2.67 


1.23 


200 


1.88 


2.83 


1.25 


145 


2.2 


3.32 


1.29 


120 


2.42 


3.65 


1.31 


100 


2.65 


4 


1.34 



Table 2. Parameters used in the reconstructions performed in this paper. 
The longitudinal correlation length as been fixed to the value L x = 0.4 
Mpc for all the reconstructions. 

tween adjacent patches. The size of the buffer region is chosen to 
be n OV or — ILt (in grid pixel units), which implies a typical resid- 
ual contamination of edge effects due to the partitioning of less than 
2 percent. 

4.3 Bias in the reconstruction 

First note that the inversion is not directly performed on the density, 
but on its logarithm, i.e. the normalized density field is written as 
pHi = 1 + Shi = exp(7_r/i) and the field jhi is interpolated by 
using the method described above. This has two advantages: (i) it 
ensures the positivity of the reconstructed density; (ii) since the 
density turns out to be roughly log-normal 1 (see for example Bi & 
Davidsen, 1997; Choudhury, Padmanabhan & Srianand, 2001; Viel 
et al., 2002a; Zaroubi et al.; 2006, Desjacques et al., 2007; and see 
Coles & Jones, 1991, for the statistical properties of the log-normal 
distribution), performing the reconstruction on the logarithm of the 
field is expected to reproduce more realistic results as shown on 
Fig. 3. 

However, as a result of the reconstruction, the recovered 
field, will be smooth over anisotropic volumes of size 

~ L x x Lt x Lt, which means that at best, one can identify struc- 
tures at this level of smoothness on a logarithmic space. Although 
theoretical predictions (namely gravitational clustering, primordial 
non Gaussianities, etc. ) do exist for the density field itself, that is 
for pHi = exp(jHi) and its smoothed counterparts, they can not 
be applied directly in our case because smoothing and taking the 
exponential are operations which do not commute, except in the 
weakly nonlinear regime, 5 hi <§C 1. In particular, recovering the 
results for 8 hi on a linear space, by taking the exponential of 7ifi 
and subsequently smoothing it, an effective bias, essentially due to 
rare peaks in the jhi, roc field, is introduced. The effect of such a 
nonlinear bias is difficult to control and can in some cases be im- 
portant as shown below and studied in more details in Appendix A. 

4.4 Testing the reconstruction: statistical and topological 
analysis 

We now test the quality of the reconstruction using the same statis- 
tical tools as in Section 3, namely the PDF of the field and the Euler 
characteristic. Other statistics are considered, such as the variance 
and the skewness of the PDF, the power- spectrum of the density 

Note that if a field such lognormal, the (inverse of the) trans- 

formation (7) leaves the new field, e.g. pnu lognormal as well. 
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Figure 9. Qualitative comparison between the original Hi density field in terms of 7 = ln(l + S) (left column in each group of 4 panels) and the recovered 
one (right column in each group of 4 panels) in a thin slice (the thickness of the slice is 8 pixels, corresponding to 1.25 Mpc). Higher densities correspond to 
darker colors. The recovered field has been obtained by inverting a set of ./VloSs = 320 random LOSs (mean separation {o!los} = 2-24 Mpc) taken through 
the original (unsmoothed) density field. In each group of panels, the first row corresponds to a slice orthogonal to the LOSs, while the second row corresponds 
to a slice parallel to the LOSs. Upper group: the raw 7 fields for the original box and for the reconstruction. Lower left group: same as the upper group but 
after smoothing (in the logarithmic space) with a Gaussian window of radius L s = v^^los) = 3.17 Mpc. Lower right group: same as the lower left one, 
but smoothing is now applied directly to the density field, 1 + 6 = exp(7), instead of its logarithm and the normalization is slightly different (see Eq. Al of 
Appendix A). 



field and the filling factor of regions less dense than the minimum 
of the Euler characteristic. In addition, to have a quantitative esti- 
mate of the accuracy in the locus of the filamentary structures, we 
use the skeleton as introduced by Novikov, Colombi & Dore (2006) 
and by Sousbie et al. (2007) and define an inter-skeleton distance 
(ISD). 

Following the discussion in Section 4.3, the reconstruction is 
mainly tested on the field 7_hI and its smoothed counterparts. In 
Appendix A we provide additional results on the field 6m. 

Since there are two scales in the inversion (see Section 4.2), 



the recovered optical depth is an anisotropic smooth field with 
fewer structures in the direction transverse to the LOSs than in 
the direction parallel to them. Optimal comparison between recon- 
structed and real optical depth would require an approach based 
on anisotropic smoothing, a level of complexity well beyond the 
scope of this paper. Instead, to compare the reconstructions to 
the exact solution, an isotropic smoothing via a Gaussian win- 
dow is used (see Eq. 5). The width of the smoothing window is 
Ls ^ {d,Los) = Lt. The choice of the optimal smoothing scale is 
constrained by the inter-skeleton distance. 
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Figure 10. Statistics and topology in logarithmic space, i.e. in terms of 7^1 = ln(l + &hi), for three different reconstructions performed with A^OS = 320, 
Nlos = 200 and A^os = 120, from left to right respectively. First row of panels: the power-spectrum of 7^ as a function of wavenumber indicated at the 
top. In each panel, the thin dashed curve represents the power spectrum of the original field while the thick solid line is the power spectrum of the (unsmoothed) 
recovered field. The light shaded region corresponds to the scatter between five realizations of Gaussian fields (GRFs) with the same power spectrum as the 
reconstruction. The wavenumber k is expressed in unit of the inverse of the pixel size multiplied by 2tt, corresponding roughly to k ~ l/L(Mpc). Second 
row of panels: the probability distribution function as a function of ■yni = ln(l + Shi (as it is indicated at the bottom), after smoothing 7/J1 with a Gaussian 
window of size L s = \/2(<iLOs}- Solid thick and dashed thin lines correspond to the recovered and the original fields, respectively. The light shaded region 
in each panel represents the scatter derived from the five GRFs, while the big dots correspond to gaussian profiles with same mean and same variance as the 
smoothed recovered fields. Third row of panels: similarly as for the second row, but for the Euler characteristic. Fourth row of panels: similarly as for the third 
row, but for the individual critical point counts. In that case, the thick and thin lines correspond to the recovered and original fields, respectively. The solid, 
dashed, dotted and dot-dashed lines correspond respectively to minima, pancake saddle points, filament saddle points and maxima. 
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Figure 11. Left panel: Comparison between the skeletons of the original field (light lines) and its recovered counterpart (darker lines) (the skeletons represented 
here are the true ones, and not their local approximations, as defined in the main text). The original field was recovered by inverting -/Vlqs = 320 lines of 
sight, corresponding to a separation (^los) = 2.24 Mpc. Both skeletons are computed on fields smoothed over a scale L s = 3.16 Mpc, in logarithmic 
space. For clarity, only a 4 Mpc slice is shown, the background contour representing the original smoothed density field (lighter colors corresponding to higher 
densities). Right panel: Evolution of the inter-skeleton distance (ISD) between the original and reconstructed fields as a function of the number of line of sight 
A^os- The ISD is computed after smoothing over a scale L s = 3.65 Mpc which is roughly equivalent to the lowest resolution reconstruction sample. The 
upper (crosses) and lower (squares) curves represent the measured median distance from the reconstructed field skeleton to the original one and vice versa 
respectively, while the dotted curve represent their average value. 



One of the uncertainties in the reconstruction involves the 
determination of the mean value of the field ptrue = {"/Hi) = 
(In Phi), which can in principle be estimated only along the LOSs. 
To improve the quality of the reconstruction, its average is fixed to 

7ffl,rcc = 7ffl,rcc — (7f/l,rcc) + Mtruc- (12) 

In practice, the knowledge of flt TUB is expected to be accurate, 
even though its actual measured value, /ilos, is determined along 
the LOSs. For instance in the worse case considered in this work, 
N hOS = 100, ((plos - Mtruc) 2 ) 1/2 /Vtruo| ~ 1.91%, where the 
mean value of the difference between the measured and the real [i 
has been calculated by averaging over 100 different realizations of 
100 LOSs. 

4.4.1 Visual inspection 

A first qualitative comparison between the original and the recov- 
ered fields in logarithmic space can be made by examining Figure 9. 
The top panels illustrate the anisotropic nature of the reconstruc- 
tion. Smoothing at a scale L s > (dLOs) (for example, in the case of 
Fig. 9, L s = V / 2(rfLOs}), greatly improves the agreement between 
the reconstruction and the exact solution and the two field become 
almost indistinguishable (bottom left panels). When one examines 
in detail where the reconstruction fails, one notices that these struc- 
tures correspond to overdense regions. The fine nature of the web 
formed by overdense regions (filaments, clusters) makes the recon- 
struction more difficult for these regions than for the underdense 
ones because of the sparse sampling of the transverse structures. 

2 Note that the inversion formula, equation (8) can be amended to impose 
directly this constraint, following equation (11) of PVRCP, by including 
this information in Mo . 

3 When the analyses are performed in linear space, the normalization is 
different, as discussed in Appendix A. 



When going to linear space, i.e. taking the exponential of the 
fields and subsequently smoothing them, the effect caused by the 
amplification of rare events discussed in § 4.3 becomes obvious, as 
illustrated by the bottom right panels of Figure 9, that represent the 
counterpart of the bottom left panels in linear space. In logarithmic 
space the highest density peaks are highly depleted, here they are 
visible and spread over a beam the typical size of which is that of 
the smoothing window. 



4.4.2 Power-spectrum: the scales correctly reconstructed 

The good agreement between the original and the recovered fields 
in logarithmic space is confirmed by the first row of panels in 
Fig. 10, which shows the power-spectrum, P{k) — (|7k| 2 ), of 
the raw fields, jhi and 7£ri, rC c, f° r three reconstructions (Alos = 
320, 200, 120, corresponding respectively to (d LO s) = 2.24,2.83 
and 3.65 Mpc). We also show five realizations of a Gaussian ran- 
dom field (GRF) with the same P{k) as 7Hi,rec> in order to es- 
timate finite volume effects. As expected, the filtering nature of 
the reconstruction introduces an apodization effect on P{k) visi- 
ble on Fig. 10: a bending of P{k) is expected to happen roughly 
for k ~ fcbond = 2tr/L T , i.e., fc bcn d = 0.44, 0.35, 0.27 from 
the upper left to the upper right panel, respectively, in the units 
chosen in Fig. 10. It is not straightforward to check accurately this 
property by visual inspection. Indeed, when A r LOS decreases, the 
small k part of the reconstructed power-spectrum becomes less 
well correlated with the true P{k), giving the illusion, for exam- 
ple, that overall the A/los = 120 reconstruction does better than 
the A r LOS = 320 one. Still for k < fcbend, i-e-, L b >Lt = {dhos), 
there is a good agreement between the reconstruction and the exact 
solution. However, the measurement of the power-spectrum itself 
is not accurate enough neither does it contain enough information 
to guarantee that filaments are well reconstructed in detail, as we 
examine now. 
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Figure 12. Variance (top panel) and skewness (bottom panel) of 'yui = 
ln(l + 5^f,) for the original (open crosses), the recovered (filled crosses) 
fields and the Gaussian prediction (light dots), as functions of the LOS sep- 
aration, (c£los)- The symbols correspond to measurements performed on 
the 7 fields smoothed with a Gaussian window of size L s = \/2(rfLOs}- 
For a proxy of the errorbars, we used the measurements at smoothing scales 
L s = ((^LOs) an d = \/3(cfLOs}. except for the Gaussian fields, where 
the dispersion among the 5 realizations is a better choice. For the Gaussian 
case, the skewness should be exactly zero. The measurements are consis- 
tent with that value, despite the large dispersion at the largest scales, due to 
finite volume effects. 
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Figure 13. Same as in Fig. 12 but for the filling factor of underdense regions 
at the minimum of the Euler characteristic. The Gaussian limit, not shown 
here, would give a filling factor exactly equal to 0.5. 




Figure 14. Contours of underdense regions estimated from the minimum 
of the Euler characteristic in logarithmic space. The thick curves repre- 
sent the contours for two recovered fields 7i^i ircc obtained by the inver- 
sion of Wlos = 320 and Wlos = 200 lines of sight (solid and dashed 
lines respectively). Prior to contour determination, the recovered field was 
smoothed with a Gaussian window of size L s = \/2(dLOs}- These con- 
tours should be compared with those of the original field, 7ffi> represented 
with a thin line, smoothed at the same scales (solid and dashed lines respec- 
tively). This figure is complemented with the two upper panels of the lower 
left group shown in Fig. 9, where the same slices for the original and the 
recovered fields are displayed. 



Linear space: contours Sunderdense 
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Figure 15. Same as Fig. 14, but smoothing is performed in linear space. 
This figure is complemented with the two upper panels of the lower right 
group in Fig. 9, where the same slices for the original and the recovered 
fields are displayed. 



4.4.3 The skeleton: optimal smoothing scale 

The visual inspection of Figure 9 seems to show that the filamen- 
tary pattern of the overall three-dimensional distribution is well re- 
covered by the reconstruction in logarithmic space. One can check 
that assertion more quantitatively on the skeleton (Novikov et al., 
2006; Sousbie et al., 2007). This will allow us to define an opti- 
mal smoothing scale which will be used in the subsequent analy- 
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ses. More detailed analyses relying on the skeleton are postponed 
to another paper. 

The actual definition of the skeleton is in fact deeply related 
to the Euler characteristic since it relies on first principles of Morse 
theory: basically, the skeleton is composed of the set of field lines 
(the curves defined by the gradient of the density field) starting 
from the filament saddle points (J = 2 in the formalism described 
in § 2.1) and converging to local maxima. 4 Although apparently 
simple, solving this equation is quite difficult, this is why a local ap- 
proximation, based on a Taylor expansion around the critical points 
contained in the skeleton, was introduced in Novikov et al. (2006) 
and extended in 3D by Sousbie et al. (2007): the local skeleton. In 
this paper we use the implementation of Sousbie et al. (2007). 

Note that, as opposed to a global topological estimator such 
as the Euler characteristic, the skeleton provides a local test of the 
accuracy of reconstruction (i.e., one can check whether a precise fil- 
ament at a given location is recovered or not). Figure 1 1(a) presents 
the skeleton (yellow lines) of a 4 Mpc slice extracted from the orig- 
inal Hi density field, as well as the skeleton of its reconstructed 
counterpart (red lines), using 320 LOSs. Both fields are smoothed 
in logarithmic space using a gaussian window of scale L s = 3.16 
Mpc. This figure confirms that, on large scales, the general shape of 
the filamentary structures is well preserved, demonstrating the abil- 
ity of the reconstruction to recover the cosmic web. Nonetheless, as 
expected, some discrepancies appear on small scales. 

The inter- skeleton distance (ISD) is an estimator which allows 
to make a quantitative comparison. A skeleton corresponds to a 
number of small segments linked together to form the filaments. 
In order to estimate the average distance between two skeletons, A 
and B, for each segment of A, the distance to the closest segment in 
B is computed leading to the PDF of the distribution of the spatial 
separation from A to B. The distance from A to B is defined as the 
median of this PDF. Since this definition of distance is not sym- 
metrical (in the sense that ISD(A,B) and ISD(B,A) will, in general, 
be different), the mean distance between A and B is defined as the 
average of these quantities. Figure 1 1(b) presents the measurement 
of the ISD between the skeleton of the original field and its recon- 
struction, as a function of the number of LOSs used to perform the 
inversion (in units of the smoothing scale). In all cases, both fields 
were smoothed over a scale L s = 3.65 Mpc. What is important to 
notice here is the sharp transition at A^los = 225, corresponding 
to L s = Lcrit with 



1.35(dLOS>- 



(13) 



For a smoothing scale L s < L cr it, the match between reconstruc- 
tion and exact solution worsens suddenly, while no significant im- 
provement is really seen when L s > L CI - lt : L C rit represents some 
"optimal" smoothing scale, which is the smallest possible scale 
at which the reconstruction performs well, in terms of filamentary 
pattern recovery. Note that only the measurements for a particular 
value of L B are shown, but Eq. (13) should not change significantly 
for the scaling range considered in this work. 

Although all the subsequent analyses involving smoothing 
were performed at various scales, namely L\ = (g(los) 2 , 
2(o!los} 2 , 3(<iLOs} 2 , increasing likewise the number of LOSs con- 
tributing per smoothing volume, we shall, in light of the above 
findings, mainly concentrate on the results corresponding to L\ = 

2(rfLOs} 2 — icrif 



4 The actual conditions for this definition to be valid are discussed in 
Novikov et al. (2006). 



4.4.4 Statistical analysis 

The second row of Fig. 10 shows the PDFs of the smoothed coun- 
terparts of jhi (dashes) and jhutcc (solid), with a window of 
size L s — %/2(cZlos)> as argued just above. These measurements 
are supplemented with Fig. 12, which shows the variance and the 
skewness of the PDFs of various fields as functions of separation 
between the LOSs. Again, the agreement between the solid and 
dashed curves in second row of panels of Fig. 10 is quite good and 
the results do not depend significantly on the value of Alos- 

From a quantitative point of view, the difference between the 
recovered and the original curves can be calculated using the fol- 
lowing estimator, 



I one; 

\Vi 
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(14) 



where y° rls = y° Ilg (xi) and y\ ac = y ICC (xi) are the values of the 
curves relative to the original and the recovered fields respectively 
and the curves have been sampled at points Xi. This corresponds to 
the area between the curves, normalized by the area enclosed by the 
original ones. For the three reconstructions shown, the errors are of 
the order of errpDF = 10 — 20%. 

These quantitative estimates show that there are still some no- 
ticeable differences between the reconstruction and the true field: 
the shape of PDF of the reconstructed field, 7Hi,rcc, tends to be 
Gaussian, within the error range provided by the five Gaussian 
fields. This "Gaussianisation" is expected from both the central 
limit theorem and the shape of the correlation matrix given by 
Eq. (10). Note that this statement is not totally consistent with the 
measurement of the skewness (lower panel of Fig. 12), especially 
at intermediate separations between the LOSs. However, this skew- 
ness is quite sensitive to the upper tail of the PDF corresponding to 
rare events in overdense regions: one expects, in that regime, de- 
viations from Gaussianity in the reconstruction because the central 
limit is not yet reached. 

The true field, ~/hi, deviates slightly from a Gaussian, as al- 
ready shown in Fig. 3. In particular, in the right part of the bell 
shape of the PDF on Fig. 10, there is a slight disagreement between 
the dashed and the continuous curves, which corresponds to the 
weak negative skewness measured in lower panel of Fig. 12. This 
disagreement would be even more visible if a logarithmic represen- 
tation were used on the y axis to display the PDF: the high density 
tail of the Hi field is far from lognormal. The main contribution 
to such a tail comes from collapsed objects in clusters and in fil- 
aments. As argued in § 4.4.1, these objects are sparsely sampled 
by the LOSs, which worsens the quality of the reconstruction in 
overdense regions. 



4.4.5 Global topology 

The nearly Gaussian nature of the reconstructed 7 field can be also 
confirmed by examining the third row of panels in Fig. 10, which 
is similar to the second row, but displays the Euler characteristic 
as a function of the density threshold. Deviations from Gaussianity 
of the true field, jhi, are more clearly visible than for the PDF. In 
particular, on all the panels, the corresponding dashed curve always 
presents an asymmetry between its two maxima, contrary to what is 
observed in the Gaussian limit. The reconstruction, 7iji jre c, being 
more symmetrical, is clearly closer to the Gaussian limit than the 
true field. However, as noticed earlier for the skewness of the PDF, 
one cannot really claim that the reconstruction is fully Gaussian: 
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deviations outside the range allowed by our five Gaussian realiza- 
tions are noticeable, particularly in the right panel and in general in 
the overdense right tail (7 > — 1.5) of x + ■ Still the overall topology 
of the reconstructed field, although closer to the Gaussian limit, re- 
produces rather well the topology of the true field, especially in the 
range — 2.0 <7< — 1.5, which confirms the findings of § 4.4.3 
on the skeleton. This density regime is indeed dominated by fila- 
ments saddle points and local maxima, as shown by the last row of 
panels of Fig. 10, which displays the different critical point counts 
as functions of the density threshold for 7^1 and 7Hi,rcc ■ Note the 
increasing contribution of the noise when TVlos decreases, which 
makes the agreement between reconstruction and exact solution 
worse, particularly for large densities, as expected. From a more 
quantitative point of view, one can, similarly as for the PDF, com- 
pute the integrated errors on the critical point counts (Eq. 14). For 
the 3 reconstructions we consider here, these errors are of the same 
order as for the PDF (i.e. less than 20%). 

As an additional test, the minimum of the Euler character- 
istic, 7min ~ —2, can be used to define a topological bound- 
ary between "underdense" and "overdense" regions. Indeed in the 
Gaussian limit, this minimum lies exactly at 7 m i n = (7}. Defin- 
ing the filling factor of underdense regions, FF undordc nsc, as the 
fraction of space occupied by points verifying 7 < 7 m i n , one ex- 
pects -F-Funderdense to be always close to 0.5: at least this is true 
for any monotonic local transform of a Gaussian field (with no ad- 
ditional smoothing). Even though the reconstruction and the true 
field do not have exactly the same behavior for x + , they seem to 
have very close values of 7 m in, which should correspond to a good 
agreement between the measured values of FF un d C rdcns C : this is 
indeed the case as shown in Fig. 13. Although the measured val- 
ues of FFundcrdcnso are consistent with those of the original ones, 
the connectivity of the underdense (or equivalently, overdense) re- 
gions defined in this way is good but not perfect, as illustrated by 
Fig. 14. In this range of densities, connectivity of the excursion is 
controlled equally by filament and pancake saddle points and their 
respective counts tend to be slightly underestimated and overesti- 
mated, respectively, as illustrated by bottom panels of Fig. 10. This 
is however not enough to explain the discrepancies in Fig. 13, and 
shows the limits of global topological estimators. 

At the qualitative level, note finally that the situation becomes 
worse when one attempts to recover the boundary contour between 
overdense and underdense regions in linear space, because of the 
bias mentioned in § 4.3. This is shown in Figure 15, which repre- 
sents the same slice as in Figure 14 but in this case the contours 
were calculated from the minimum of the Euler characteristic after 
smoothing the exponential of the fields. Here, the position of the 
structures in the recovered contours is significantly different from 
that of the original fields, not to mention connectivity. Appendix A, 
which discusses a figure similar to Fig. 10 but in linear space, fully 
confirms these results. 



5 DISCUSSION AND CONCLUSION 

In this paper we have studied the topology of large scale structures 
as traced by the intergalactic medium (IGM) in a hydrodynamical 
cosmological simulation. The main goal was to test a reconstruc- 
tion method (PVRCP) of the three-dimensional large scale matter 
distribution from multiple lines of sights (LOSs) towards quasars. 
For this purpose, we relied on a number of global statistical tools, 
the probability distribution function of density field (PDF), the Eu- 
ler characteristic (x + ) as an alternate critical point counts and re- 



lated quantities such as the variance and the skewness of the PDF, 
the individual critical points counts and the filling factor of the un- 
derdense regions at the minimum of the Euler characteristic. We 
also used the skeleton as local probe of the geometry and the topol- 
ogy of the field. The main results of our investigations can be sum- 
marized as follows 

• In the first part of this paper we addressed the problem of re- 
lating the topology of the dark matter density field to the topology 
of the distribution traced by the total amount of gas and the neutral 
gas (Hi). When one considers the Hi density distribution at scales 
larger than the Jeans length of the gas and takes into account the 
IGM equation of state relating the neutral and total amount of gas, 
then the properties of this nearly lognormal distribution are exactly 
the same as found for the dark matter/total gas in underdense re- 
gions (i.e. for density contrasts S < 0). For larger density contrasts, 
some deviations appear, due to shocks (where Hi is depleted) and to 
the presence in filaments and clusters of highly condensed objects 
(where Hi is very concentrated). Taking these results into account, 
with the additional assumption that instrumental noise, in particular 
effects of saturation, can be neglected, we have shown that studying 
the topological properties of large scale matter density distribution 
is equivalent to studying directly those of the optical depth or in 
what follows, those of neutral gas, Hi. 

• In the second part of this work we tested the Wiener interpo- 
lation proposed by PVRCP to recover the three-dimensional distri- 
bution of Hi from a set of multiple LOSs, along which the (one- 
dimensional) distribution of Hi is assumed to be known exactly. 
This interpolation depends on three parameters, the typical size, 
L x of structures along the LOSs, the typical mean LOSs separa- 
tion, Lt — (d-Los), and the expected variance of the fluctuations 
of the field which can be in principle indirectly inferred from the 
LOSs themselves. 

Our investigation shows that the reconstruction method can be 
used to predict quite accurately the patterns in the large scale mat- 
ter distribution at scales of the order of ~ 1.4(cJlos) or larger when 
one attempts to recover the logarithm of the density field. In par- 
ticular it allows us to recover the position of filaments in the large 
scale distribution: we compared the skeleton of the initial and re- 
covered field and measured the distance between these skeletons 
and found that for smoothing scales larger than ~ 1.4(o!los}, the 
inter- skeleton distance remains smaller than (c£los}- Furthermore, 
the global shape of the PDF, of the fraction of critical points and 
of the Euler characteristic are well reproduced, the integral errors 
on these quantities varying in the range 10-20%. Discrepancies be- 
tween the reconstruction and the exact solution are mainly found 
in overdense regions, where deviations from a lognormal behavior 
are the most significant. 

The good recovery of the statistical properties of the density 
field in logarithmic space, is strongly related to the Gaussian prior 
on which the inversion method is based. Recall that, since the dis- 
tribution of the gas density is very close to log-normal, the distri- 
bution of its logarithm is well approximated by a Gaussian func- 
tion. As demonstrated in PVRCP, the Wiener interpolation is just 
a special case of the maximum likelihood method. It gives, under 
the hypothesis that the statistical distributions of the data and of 
the parameters are Gaussian, the optimal reconstruction for a lin- 
ear model. However, this relies on a proper knowledge of the co- 
variances matrices. Here we assume a simple functional shape for 
these matrices, given by Equation (10). A better treatment would 
need an accurate knowledge of the underlying power-spectrum of 
the logarithm of the density. The interpolation could for instance be 
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improved by using a stronger prior relying on the extension of e.g. 
the nonlinear ansatz of Hamilton et al. (1991) to logarithmic space. 

We noticed that some deviations are present in the original 
field, compared to the log-normal limit at the scales we have probed 
here. This information could be added to the model. This could be 
achieved by applying an Edgeworth expansion to the logarithm of 
the field (Juszkiewicz et al. 1995; Colombi 1994), hence by tak- 
ing into account slight deviations of the likelyhood function from 
a Gaussian distribution to correct our Wiener interpolator (Amen- 
dola, 1998). 

Even though the best variable for the reconstruction is the log- 
arithm of the density, theoretical predictions are usually performed 
on the density itself. Therefore it is in practice difficult to compare 
the properties of the reconstructed density distribution to those pre- 
dicted by e.g. nonlinear perturbation theory (e.g. Bernardeau et al. 
2002) or other models. The problem is that linear space gives more 
emphasis to rare events in overdense regions. In Appendix A we 
analyse the corresponding bias on the reconstruction, and find that 
it is critical for the higher density peaks. As a result, the tomog- 
raphy is in practice much less robust when expressed directly in 
linear space. However, this is mainly related to the fact that our 
analyses are performed at scales smaller or of the order of 4 Mpc, 
where non-linear effects in the dynamics are still present. 

Due to the size of our simulation (Lb ox = 40 Mpc), in this 
work we have analysed the properties of connectivity at relatively 
small scales (L s — 4 Mpc), where the distribution of matter is 
close to log-normal 5 . However, one could in principle extend the 
analyses to larger scales, to probe the linear or quasi-linear regime, 
where the density distribution is actually close to Gaussian. In that 
case, the reconstruction should be performed on the density itself 
rather than on its logarithm while the above mentioned problems 
would become irrelevant. In particular, the implementation of the 
improvements on the Wiener interpolator could for instance be used 
to test directly if non trivial deviations from Gaussianity are present 
or not in the data. If present, they could be ascribed to primor- 
dial non-gaussian features that are produced during the inflationary 
phase or as a result of topological defects. 

The inversion method is based on the hypothesis that a suf- 
ficiently strong correlation exists at the scale under consideration. 
Indeed, various sources of noise can hide such a correlation com- 
pletely (errors due to the finite cosmological volume probed by a 
finite number of LOSs, noise in the measurement of the spectra), 
making the reconstruction irrelevant. To test the strength of the 
correlation a large number of quasar pairs spanning the range of 
separations we want to probe must be observed. It has been re- 
cently shown (Coppolani et al. 2006) that at z ~ 2 for a separation 
of ~ 5 arcmin (corresponding to w 7.6 Mpc for a flat universe 
with ft m = 0.3, = 0.7 and H = 70kms~ 1 Mpc -1 ), some 
correlation is observed, suggesting that the inversion method could 
be applied at such scales. It is thus very important to measure more 
accurately the transverse correlation function from quasar pairs. In- 
deed, once this is done, we can include this information as a self- 
consistent prior in the reconstruction procedure. 

Using realistic data about the luminosity function of quasars 
(Jiang et al. 2006), it is found that for magnitude limits of 
g ~ (23, 24, 25) the number of quasars observed per square degree 
at z iS 2 is nQsos = (41, 77, 136) respectively. For the set of cos- 



5 Note that, because of the small size of our simulation, we could not 
really examine the effects of cosmic variance, except with our Gaussian 
realizations. 



Magnitude limit 


Separation (QSOs) 


Separation (QSOs et LBGs) 


9 


(arcmin) 


(arcmin) 


23 


9.4 


9.3 


24 


6.8 


4.3 


25 


5.2 


1.2 



Table 3. Mean angular separation between the background sources as a 
function of the magnitude limit (left column). 

mological parameters assumed here, the corresponding mean angu- 
lar separations are (cZlos) = (9.37, 6.84, 5.15) arcmin. Moreover, 
for g iS 23 the number density of Lyman-break galaxies (LBGs) 
starts to become significant and we can think of using these ob- 
jects as background sources in combination with QSOs. In par- 
ticular, it is found that for g i; (23, 24, 25) the number of LBGs 
par square degree is tilbGs = (0.3, 116, 2325) respectively (Adel- 
berger & Steidel 2000), so that, even at g < 24, the number of avail- 
able sources is largely increased. In Table 3 we display the mean 
separation one can expect as a function of the magnitude limit. 

One can see that if we are able to observe objects up to a mag- 
nitude limit of g ~ 24, the density of background sources will 
be high enough to perform a reconstruction similar to what de- 
scribed in this paper. A better approach will be to search for pe- 
culiar fields in which the density is larger by chance (e.g. Petitjean 
1997). The spectral resolution will be a decreasing function of the 
magnitude. Observational difficulties will include the contamina- 
tion of the LBG spectrum by absorption lines originating in the in- 
terstellar medium of the galaxy and the fact that the mean redshift 
(z « 2.8) will be larger than what we have considered in this paper. 
To reach these faint magnitudes we need to wait for the advent of 
the Extremely Large Telescopes (Theuns & Srianand 2006). 

To conclude, the approach developed here is very promising as 
the advent of Extremely Large Telescopes will boost this field by 
allowing the observation of a number of background sources large 
enough to probe the distribution of the matter with accurate preci- 
sion at the scales under consideration. The total amount of observ- 
ing time will be large however but worthwhile given the expected 
results foreseen in this paper. 



ACKNOWLEDGEMENTS 

We thank D. Pogosyan for providing us with his calculations of 
critical count numbers predicted in the Gaussian limit, as dis- 
played as smooth curves on Fig. 2. We thank S. Prunet, R. Teyssier 
and D. Weinberg for stimulating discussions and D. Munro for 
freely distributing his Yorick programming language and opengl in- 
terface (available at http : / /yorick . sourcef orge . net/). 
This project was partially performed as a task of the HORIZON 
project (www.projet-horizon.fr). The hydrodynamical simulations 
were run on the NEC-SX5 of the Institut du Developpement et des 
Ressources en Informatique Scientifique (IDRIS) in Orsay. 



REFERENCES 

Adelberger, K. L., Steidel, C. C, 2000, ApJ, 544, 218 
Amendola, L., 1998, astro-ph/9810198 

Aracil, B., Petitjean, P., Smette, A., Surdej, J., Miicket, J.P, Cristiani, S., 
2002, A&A, 391, 1 



1 8 Sara Caucci et al. 



Bardeen, J. M, Bond, J. R., Kaiser, N., Szalay, A. S., 1986, ApJ 304, 15 
Bemardeau, R, Colombi S., Gaztanaga, E., Scoccimairo, R., 2002, Physics 

Reports 367, 1 
Bi, H. G., Davidsen, A. R, 1997, ApJ, 479, 523 

Cen, R., Miralda-Escude, J., Ostriker, J., P., Rauch, M., 1994, ApJ, 437, L9 
Choudhury, T. R., Padmanabhan, T., Srianand, R., 2001, MNRAS, 322, 561 
Coles, P., Jones, B.,1991, MNRAS, 248, 1 
Colombi, S., 1994, ApJ, 435, 536 

Colombi, S., Bouchet, R R., Schaeffer, R., 1994, A&A, 281, 301 
Colombi, S., Pogosyan, D., Souradeep, T., 2000, PhRvL, 85, 5515 (CPS) 
Coppolani, R, Petitjean, P., Stoehr, R, Rollinde, E., Pichon, C, Colombi, 
S., Haehnelt, M., Carswell, B., Teyssier R., 2006, MNRAS, 370, 1804 
Croft, R.A.C., Weinberg, D.H., Katz, N., Hernquist, L., 1998, ApJ, 495,44 
Crotts, A.P.J., Fang, Y, 1998, ApJ, 502, 16 
Dave et al. 2001, ApJ, 552, 473 

Desjacques, V„ Nusser, A., Sheth, R. K., 2007 MNRAS, 374, 206 

Doroshkevich, A. G. 1970, Astrophysics, 6, 320 

Gnedin, N. Y., Hui, L., 1998, MNRAS, 296, 44 

Gott in, J. R., Melott, A. L., Dickinson, M. 1986, ApJ, 306, 341 

Gott III, J. R., Weinberg, D. H., Melott, A. L., 1987, ApJ, 319, 1 

Guimares, R., Petitjean, P., Rollinde, E., de Carvalho, R. R., Djorgovski, S. 

G, Srianand, R., Aghaee, A., Castro, S., 2007, MNRAS, 377, 657 
Gunn, J. E., Peterson, B. A. 1965, ApJ, 142, 1633 
Guzzo, L. 2001, astro-ph/0 102062 

Hamilton, A. J. S.; Kumar, P.; Lu, Edward; Matthews, Alex 1991, ApJ, 374, 
1L 

Hui, L., Stebbins, A., Buries, S. 1999, ApJ, 511, L5 

Jiang, L., Fan, X., Cool, R. J., Eisenstein, D. J., Zehavi, I., Richards, G. T., 
Scranton, R., Johnston, D., Strauss, M. A., Schneider, D. P., Brinkmann, 
J., 2006, AJ, 131,2788 

Juszkiewicz, R., Weinberg, D. H., Amsterdamski, P., Chodorowski, M., 
Bouchet, R, 1995, ApJ, 442, 39 

Kaiser, N. 1984, ApJ, 284, L9 

Kaiser, N. 1987, MNRAS, 227, 1 

McDonald, P., Miralda-Escude, J. 1999, ApJ, 518, 24 

ecke, K. R., Buchert, T., Wagner, H, 1994, A&A 288, 697 

Milnor, 1963, Morse Theory p.29 

Miralda-Escude, J., Cen, R., Ostriker, J. P., Rauch, M., 1996, ApJ, 417, 582 
Mucket, J. P., Petitjean, P., Kates, R. E., Riediger, R. 1996, A&A, 308, 17 
Nakagami, T., Matsubara, T., Schmalzing, J., Jing, Y, 2004, astro- 
ph/0408428 

Novikov, D., Colombi, S., Dore, O., 2006, MNRAS, 366, 1201 
Nusser, A., Haehnelt, M., 1999, MNRAS, 303, 179 

Park, C, Choi, Y, Vogeley, M. S., Gott III, J. R., Kim, J., Hikage, C, Mat- 
subara, T., Park, M., Suto, Y, Weinberg, D. H, 2005, ApJ, 633, 1 1 
Peacock, J. A., Dodds S. J., 1996, MNRAS, 280, L19 
Petitjean, P., Mucket, J. P., Kates, R. E., 1995, A&A, 295, L9 
Petitjean, P., 1997, euvl.conf, p266, (arXiv:astro-ph/96081 15) 
Petitjean, P., Surdej, J., Smette, A., Shaver, P., Mucket, J., Remy, M. 1998, 
A&A, 334, L45 

Pichon, C, Vergely, J. L., Rollinde, E., Colombi, S., Petitjean, P., 2001, 

MNRAS, 326, 597 (PVRCP) 
Protogeros, Z. A. M., Weinberg, D. H, 1997, ApJ, 489, 457 
Rauch, M., 1998, ARA& A, 36, 267 
Rollinde, E., Petitjean, P., Pichon, C, 2001, A&A, 376, 28 
Rollinde, E., Petitjean, P., Pichon, C, Colombi, S., Aracil, B., D'Odorico, 

V., Haehnelt, M. G. 2003, MNRAS, 341, 1299 
Schmalzing, J., Buchert, T., 1997, ApJ, 482, LI 

Sousbie, T., Pichon, C, Courtois, H, Colombi, S., Novikov, D., 2006, sub- 
mitted to ApJLett. (arXiv:astro-ph/0602628) 

Sousbie, T., Pichon, C, Colombi, S., Novikov, D., Pogosyan, D. 2007, sub- 
mitted to MNRAS (arXiv:astro-ph/0707.3123) 

Spergel, D. N, Bean, R., Dore, O., Nolta, M. R., Bennett, C. L., Dunkley, J., 
Hinshaw, G., Jarosik, N., Komatsu, E., Page, L., Peiris, H. V., Verde, L., 
Halpem, M., Hill, R. S., Kogut, A., Limon, M., Meyer, S. S., Odegard, 
N., Tucker, G. S., Weiland, J. L., Wollack, E., Wright, E. L., 2007, 
ApJS, 170, 377 




Linear space 
2.5 3.0 3.5 



4.0 



* Original 

+ Recovered 

• Lognormal 



I J 



2.5 3.0 3.5 

Separation [Mpc] 



4.0 



Figure A2. Same as Figure 12, but in linear space, as explained in caption 
of Fig. Al. 
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APPENDIX A: RECOVERED FIELD: ANALYSIS IN 
LINEAR SPACE 

While the reconstruction seems to perform well for 7 = ln(/o) and 
its smoothed counterparts (except that it is somewhat "Gaussian- 
ized", as shown by the measurements in the main text), let us now 
investigate what happens for the statistical properties of the field 
itself p — exp(7). 

It was noted in that case (§ 4.3) that the recovered field is ex- 
pected to be biased, originating from the fact that taking the expo- 
nential of a field does not commute with smoothing via a Gaus- 
sian window. Furthermore, taking the exponential gives empha- 
sis to high density peaks, which are the most poorly reconstructed 
(§ 4.4). Additional smoothing contaminates neighboring pixels as 
well, resulting in significant changes in the connectivity. These ef- 
fects were confirmed at the qualitative level in the main text by vi- 
sual inspection of Figs. 9, 14 and 15. We now examine them more 
quantitatively. 

As just argued above, since we are working in linear space, 
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Figure Al. Same as in Fig. 10, but in the linear space, i.e. by taking the exponential of the recovered fields and the Gaussian realizations along with 
normalization Al, with subsequent smoothing with a Gaussian of size L s = v^2((2los) f° r the last three row of panels. The big dots on the second row of 
panels now correspond to a lognormal distribution with same variance and average as the reconstruction. 



p ~ exp(7), rare events in overdense regions (which are poorly re- 
constructed) dominate. As a consequence, the reconstruction fails 
with respect to the mean density: equation (12) is clearly not ap- 
propriate anymore to normalize the reconstruction. Instead, the re- 
constructed density, pHi,iec, is renormalized as follows: 



Phi,; 



= {PHl) 



T>(7f 



(exp(7ffr,rec)) ' 



(Al) 



where (pm) is the true mean density in the simulation. Note that 
this density is no longer accurately determined from direct mea- 
surements on the LOSs: in the worse case considered in this paper, 
•Nlos = 100, we indeed find a relative error on the estimate of 



20 Sara Caucci et al. 

Linear space 

0.60 



0.55 



o 
to 



0.50 



0.45 



0.40 



J I I I I I I I I I I I I I I I I I I I I I I l_ 

^ Original 



+ Recovered 




I 



"i i i i i i i i i i i i i i i i i i i i i + 1 i 



2.0 2.5 3.0 3.5 4.0 

Separation [Mpc] 

Figure A3. Same as Fig. 13, but in lineal' space, as explained in caption of 
Fig. Al. 



(pHi) of the order of 30%. However, the simulation volume is quite 
small, leading to unrealistically short LOSs. In real observations the 
determination of the average neutral gas density along LOSs should 
be much more accurate (Guimares et al. 2007). 

The choice of the normalization given by Eq. (Al) is natural 
since it imposes the average density of the reconstructed field to be 
equal to that of the exact solution. However, because it is still af- 
fected by overdense regions contributions, this normalization is not 
fully satisfactory as it does not lead to the appropriate corrections 
in underdense regions, as can be noted by a careful examination of 
4 lower right panels of Fig. 9. 

The contamination by high density peaks affects all statistics, 
as illustrated by Figs. Al and A2. This is particularly dramatic for 
second order statistics (upper row of Fig. Al and upper panel of 
Fig. A2). The reconstruction underestimates the normalization of 
the power-spectrum, and as a result the variance of the PDF, espe- 
cially when the separation between the LOSs is small: in the latter 
case, nonlinear features in the density field are given more weight 
and are poorly captured by the reconstruction. This appears as a 
shift in the PDF shown in the second row of panels in Fig. Al, 
worsening with increasing TVlos- Note however that the agree- 
ment between the reconstruction and the exact solution, although 
poorer than in logarithmic space, improves when A^os )5 200. 
Note also that the smoothed lognormal fields no longer match the 
reconstruction. In fact, in the linear space, its seems that the recon- 
struction gives a solution intermediate between the exact one and 
the smoothed lognormal fields, both from the point of view of the 
power-spectrum and the PDF (and its cumulants) (Fig. A2): it cap- 
tures more than just the Gaussian features of the logarithm of the 
real solution, as would have naively followed from the analysis of 
§ 4.4. 

These results are confirmed in the third row of panels in 
Fig. Al: the measured Euler characteristic of the reconstruction 
gives an intermediate solution between the true and the lognor- 
mal solution (see for instance the position of the local extrema of 
the curves representing x + )- Note that overall, the reconstruction 
matches better the lognormal behaviour than the true solution, espe- 
cially when A^los is large, implying that "lognormalization" dom- 



inates, at least from a topological point of view, while nonlinear 
dynamics implies significant departures from a purely lognormal 
behavior. This explains again why the quality of the reconstruction 
decreases when attempting to probe the smallest scales. Note that 
this does not mean that decreasing the number of LOSs is better: the 
analysis always looks at the smallest scale recoverable in logarith- 
mic space, ~ 1.4(dLOs)- 6 At fixed smoothing scale, a reconstruc- 
tion with a given number of LOSs does better than a reconstruction 
with sparser LOS sampling. Still, note that the reconstruction does 
more than a simple "lognormalization" as it gives an intermediary 
answer between the expected lognormal behavior from the analysis 
in logarithmic space and the true solution, at least from the point 
of view of the PDF and the Euler number. The uncertainties in the 
measurements due to the emphasis put on rare events are however 
too large to drive definite conclusions with a small sample of LOS: 
the spread between the five lognormal fields is much larger than 
they were in the logarithmic space (and similarly for the PDF). 

Let us finally check the global topological properties of the 
reconstruction by examining the number counts of each kind of 
critical points individually, as shown in the last row of panels in 
Figure Al. Notwithstanding all the above points, note that the in- 
version achieves a fair reconstruction of the distribution of some of 
the critical points: in the low density regime, it overestimates the lo- 
cal minima count, as expected from visual inspection of four lower 
right panels of Fig. 9 and from the PDF: the reconstructed field 
in underdense region is overestimated. In the intermediate density 
range, reconstruction overestimates pancake saddle point counts 
(and to a lesser extent, underestimates filaments saddle point and 
local maxima counts) for /Vlos = 320 while larger separations 
between LOSs do better. In the overdense regime, where the recon- 
struction fails more dramatically, and where the amplification of the 
errors is large, one tends to overestimate (underestimate) filament 
saddle points (local maxima). 

Still, it is interesting to note that the local minimum of the Eu- 
ler number, p m in ~ 0.7 is comparable for the reconstruction and 
the exact solution, suggesting that the measured filling factor de- 
fined previously will be similar for the reconstruction and the exact 
solution: according to Figure A3, the filling factor of underdense 
regions at the minimum of the Euler number does nearly as well as 
in logarithmic space, but the match between its isocontours is worse 
than before (compare Figure 14 with Figure 15): thus, even if the 
critical point counts and the fraction of underdense regions agree, 
this does not necessarily imply that the structures, in particular the 
densest ones, are at the right position. 



6 We did not examine the skeleton in linear space to find the best smoothing 
scale in that case. 



